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THE DAVIDOIM-FLETCHER-POWELL PENALTY FUNCTION METHOD: 
A GENERALIZED ITERATIVE TECHNIQUE FOR SOLVING 
PARAMETER OPTIMIZATION PROBLEMS 

By Ivan L. Johnson, Jr. 

Lyndon B. Johnson Space Center 

SUMMARY 


The Davidon-Fletcher-Powell penalty function method is a technique that has 
been used successfully to solve constrained minimization problems . The method 
was devised by combining an exterior penalty function with a performance function 
for solving constrained minimization problems . 

Constrained minimization problems include parameter optimization, optimal 
control , boundary values , and iterative problems . In addition to relating the ex- 
perience obtained while observing the technique, this report addresses the success- 
ful application of the Davidon-Fletcher-Powell penalty function technique in combi- 
nation with double-precision arithmetic for solving atmospheric and exoatmospheric 
flight optimization problems . 


INTRODUCTION 


One of the most significant advances in numerical optimization in the last 5 
years has been the use of finite difference arithmetic to determine partial deriva- 
tives for numerical gradient optimization iterative methods . This single achieve- 
ment has substantially increased the level of problem statement complexity that can 
be subjected to numerical optimization methods. The purpose of this report is to 
describe one such iterative method, the Fletcher-Powell version of the Davidon 
variable metric unconstrained minimization technique , using finite difference gra- 
dient information together with an exterior penalty function technique for solving 
constrained minimization problems . 

The Davidon-Fletcher-Powell (DFP) method was combined with an exterior 
penalty function method to provide the capability for solving constrained minimiza- 
tion problems (ref. 1) . Subsequently, a computer program of the DFP penalty func- 
tion method was developed and has been used continuously to solve many types of 
parameter optimization problems . 

One of the first types of problems solved was an impulsive orbit transfer opti- 
mization problem (ref. 2) wherein the components of the velocity impulses and the 
transfer angles were defined as variables or parameters. Other types of problems 
solved were in the areas of optimal nuclear reactor design, optimal aircraft control 



systems design (ref. 3) , optimal feature selection (Earth resources) , curve fitting 
data, solving systems of equations, optimal electronics problems (radar), and at- 
mospheric and exoatmo spheric flight optimization^ ^ (refs . 4 to 7) . With the ex- 

ception of the last two problems , the DFP penalty function method was used with 
closed-formed gradients , or partial derivatives , on all these problems . The optimal 
electronics problem involving a simplified design of a radar system was solved by 
using numerical , central , and forward difference gradients computations . Before 
this problem was solved, the accuracy of the DFP method in using numerical partial 
derivatives was questionable , even though the optimal electronics problem was a 
simple problem with only seven parameters . The atmospheric and exoatmospheric 
flight optimization problems were also solved by using numerical gradients , but 
some of these problems involved as many as 60 parameters and included trajectories 
integrated over thousands of seconds. However, the DFP penalty function method 
apparently worked almost as well with numerical gradients as with closed-form 
gradients . 

In solving atmospheric flight optimization problems , many techniques were 
used to satisfy state and control inequality constraints . One technique used , when- 
ever possible , was that of solving for the control explicitly from the constraint equa- 
tion when the constraint was violated. However, a different technique, an error 
function method similar to the integral penalty function method, was used when this 
was not possible. This method provides a better way of shaping a control and satis- 

0 

fying state and control inequality constraints (ref. 8) . 

A tried and proven computer program of the DFP method was devised by the 
Mathematical Analysis Section of the Software Development Branch at the NASA 
Lyndon B . Johnson Space Center . With this program , the user can select either 
closed-form gradients, central difference gradients, or forward difference gra- 
dients . The one-dimensional search that is used by the DFP method offers three 


^I. L. Johnson and J. L. Kamm, "Performance Analysis of 049A/SRM Shuttle 
for Maximum GLOW," JSC IN 72-FM-202, Sept. 1972. 
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J. L. Kamm and I. L. Johnson, "An Optimal Solid Rocket Motor Thrust Pro- 
file to Minimize 040A Shuttle GLOW," JSC IN 72-FM-159, June 1972. 

^I. L. Johnson, "A Minimum Sized Space Shuttle Configuration," JSC IN 72- 
FM-161, June 1972. 

^I. L. Johnson, "Minimum Fuel Deorbit Using PEACE Program," JSC IN 72- 
FM-136, May 1972. 
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I. L. Johnson, "Sizing a Siamese Lifting Body Triamese Tank Ascent Config- 
uration," JSC IN 72-FM-259, Nov. 1972. 

I . L . Johnson , "The Error Function Method for Handling Inequality Con- 
str nts," JSC IN 73-FM-114, July 1973. 
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options: cubic fit using the slope at the initial point, cubic fit without the slope, 

7 8 • 

and the golden section method (refs. 9 and 10 ) . Interfacing this program with 

another program is both simple and flexible . A version of this program is in the 

space vehicle dynamics simulation (SVDS) program and is currently in the checkout 

phase. The DFP penalty function program, which can be obtained from the author, 

is sometimes referred to as the parameter optimization program , the accelerated 

gradient program , or the Davidon program . 


SYMBOLS 


a^ , a , ag , Sg polynomial coefficients 
a. polynomial coefficient 


f(x) 

g(x) ,h(x) 

H 

I 

n 

P 

P(x) 


P 


XX 


s. 

1 

U(h) 

U(hj) 

V 


general performance function 
vectors of point constraint functions 
positive -definite symmetric matrix 
identity matrix 
diagonal weight matrices 

number of variables or parameters 
penalty and quadratic function 
penalty and quadratic function of x 
gradient of p(x) 

matrix of second partials with respect to x 

direction of search on i-th iteration 

diagonal matrix of unit step functions 

diagonal elements of U (h) 

speed 


7 

I. L. Johnson and G. E. Meyers, "One-Dimensional Minimization Using 
Search by Golden Section and Cubic Fit Methods," JSC IN 67-FM-172, Nov. 1967. 
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The correction to this paper can be obtained from the author . 
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X n-dimensional parameter vector 

y. difference in gradient vectors for i-th iteration 

Y one-dimensional search parameter 

y. value of y the i-th iteration for the minimum of p (x. + ys^) 

e small positive number 

A, Lagrange multiplier 


Subscripts: 

f final value 

LB lower bound 

X partial derivative with respect to vector x 


Operators: 

T transpose 

-1 inverse 

at minimum and desired value 
' ordinary derivative 

GENERAL DESCRIPTION 


The constrained minimization problem , or nonlinear programing problem , is 
defined as finding the n dimensional parameter vector x that locally minimizes 
the general performance function f(x) and that satisfies the vectors of point con- 
straints g(x) = 0 and h(x) <0. A popular method for solving this problem is the 
gradient projection method. This method was initiated by Rosen (refs. 11 and 12) 
and later improved by Kelley and Speyer (ref. 13) and others. It requires larger 
amounts of computer storage than others, and it requires the computation of an in- 
verse of a matrix . Because of linearization assumptions , this method may also re- 
quire small step sizes. The user must be knowledgeable in selecting numerous 
constants associated with the gradient projection method because the transition be- 
tween performance function minimization and constraint satisfaction must be proper- 
ly balanced. 
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The penalty function method provides an alternate approach for solving the 
constrained minimization problem by solving an unconstrained minimization prob- 
lem. The performance function together with a term that "penalizes" it for viola- 
tions in the constraints is minimized. Penalty functions are divided into two classes: 
interior and exterior . 

The interior penalty function method by Fiacco and McCormick (refs . 14 and 
15) , which was popularized by Brusch (ref. 16) and others, is probably the best 
interior method currently known . Although performance using this method is good , 
an initial parameter vector that satisfies the point inequality constraints is needed . 

An exterior penalty function method (refs. 1, 17, and 18) allows initial pa- 
rameter vectors to cause violations in inequality constraints . This method has low 
computer storage requirements and a smooth, continuous transition between per- 
formance function minimization and constraint satisfaction. The exterior method, 
as well as the interior method , is minimized by an unconstrained minimization 
method . 

A survey of the literature reveals many promising unconstrained minimiza- 
tion methods derived by using various assumptions. Probably the most basic method 
is sectioning. This method simply produces a direction of search, by changing 
only one parameter per iteration . Although sectioning does not require the compu- 
tation of costly gradient vectors , it requires a very large number of function compu- 
tations and small steps . 

The method of steepest descents (ref. 1) , with a direction of search essen- 
tially in the direction of the negative gradient vector, is based on linear assumptions 
and thus requires small steps. Convergence is slow with the steepest descent 
method, but it is faster than sectioning. The pattern search can be used to speed 
up this method and others by periodically forming a direction of search with a vec- 
tor that is the resultant of k steps . Linear assumptions are still a disadvantage 
in using the method of steepest descents. 

Several methods derived by assuming that the function to be minimized is 
quadratic take steps in directions that are conjugate with respect to the matrix of 
second partial derivatives . For a quadratic function of n variables, if the steps 
are found by performing successive , one-dimensional minimizations in these con- 
jugate directions , then convergence to the minimum is obtained within roundoff in , 
at most, n steps (ref. 19) . When applied to a nonquadratic function, the function 
to be minimized appears to be quadratic as the minimum is approached . (The 
second-order terms in the Taylor series expansion of the function dominate because 
the first-order terms are vanishing.) These methods accelerate convergence, 
whereas methods based on linear assumptions , such as steepest descent , tend to 
slow it down . 

Two of these methods , the conjugate gradient method of Fletcher and Reeves 
(ref. 19) and a method devised by W. C. Davidon (ref. 20) that was later modified 
by Fletcher and Powell (ref. 21), are currently the most popular . Compared to 
Davidon' s method, the conjugate gradient method requires less computer storage 
and much simpler computation. In reference 22, however, Davidon' s method fea- 
tured much faster convergence for nonquadratic functions . 
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Because of the advantages, the Davidon-Fletcher-Powell method is used as 
the unconstrained minimization method in conjunction with an exterior penalty func- 
tion for solving the highly nonlinear constrained minimization problems defined by 
optimization problems . 


CONSTITUENT ELEMENTS 
DFP Method 

The DFP method is one of the most powerful iterative methods known for mini- 
mizing a general unconstrained function of n variables or parameters . This itera- 
tive method was derived by W. C. Davidon (ref. 20) and later modified and clarified 
by R. Fletcher and M. J. D . Powell (ref. 21). It is one of many derived by con- 
sidering that the function to be minimized is a quadratic function . 

Overall procedure .- Iterative minimization methods determine the position of 
a minimum x of a function as the limit of a sequence of successive positions x^, 

x^ , These positions are obtained by minimizing the function along a line through 

the previous position in a specified direction of search s^ obtained by some specified 

rationale . For a quadratic function p (x) with a regular minimum located at the 
position X, 


p(x) = p(x) + ^(x 




( 1 ) 


In reference 19, it was shown that if a sequence of nonzero directions of search 
Sq , Sf , S 2 , . . . is p^^ orthogonal or conjugate , that is , 


T 

s. p s. = 0 for i ^ j 
1 ^xx j 

> 0 otherwise (2) 


and if at the end of each search p(x^ + ys.) is minimized with respect to the one- 
dimensional search parameter y , that is , 




= 0 


(3) 


then the minimum of p is obtained in , at most , n iterations . An iterative minimi- 
zation method that minimizes a quadratic function in a finite number of iterations, 
usually n, is said to have quadratic convergence (ref. 23) . 
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For a nonquadratic function p (x) , a Taylor series expansion about the posi- 
tion of the minimum x is given by 


p(x) = p(x) + i(x - x)'^p^^ 


(x) (x - x) + higher order terms 


(4) 


From equation (4) , p (x) behaves like a quadratic function near the minimum 
for all X in a small neighborhood about x: | | x - x 1 1 < s , where s is a sufficient- 
ly small positive number. For values of x in this neighborhood, quadratically con- 
vergent, iterative methods will accelerate convergence to the minimum of the general 
function . 

If given a position x^ for a quadratic function p , the position of its minimum 
X can be found in one iteration by x = Xq + s^ where 


“o ' -Pxx'^Px^o) 


and p ^ is assumed to exist . 

XX 

For a general function p beginning at the point x^, the position of the mini- 
mum X is found as the limit of the sequence x_, x^, . . . , x; each x. is obtained 
by applying ~ ^i ^ ^i®i where ^ 


s. 

1 




( 6 ) 


Equations (5) and (6) , sometimes referred to as Newton's method, require the com- 
putation of p and its inverse. For a general function, computations of p ^ 
would be performed at each point x^. This procedure often would require exces- 
sive time because p would probably be obtained by a numerical method rather 

XX 

than in closed form. Also, p^^ must be inverted, but this requirement may not be 
achievable because there is no guarantee that p is nonsingular for all points x. 
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Instead of computing p , a positive-definite symmetric matrix H is com- 
puted iteratively in the DFP method such that a direction of search is given by 


s. = -H.p 
1 rx 


(’=1) 


(7) 


For a quadratic function , H converges to p and for a general function , H 

1 XX 

tends to p evaluated at the minimum . 

^xx 

The DFP method begins with an initial point x^, an initial matrix (some- 
times selected as the identity matrix I) , p(Xq) , and the gradient vector Pjj.(Xq) . 

During the i-th iteration , the direction of search is computed from equation 
(7) , and p(x^ + ys^) is minimized as a function of y, a positive scalar step-size 

parameter . The position of the minimum is defined as 


^i+1 


= X. + y.s. 
1 'll 


(8) 


where y^ is the value of y on the i-th iteration for the minimum of p (x^ + ysp 

as a function of y. At the new position the gradient p (Xj^j) is computed, 

and H is changed according to ^ ^ 


H. 


i+1 




T T 

s.s. H.y.y. H. 

11 !•' l-' 1 1 


s. y. y. H.y. 

1^1 j j 


(9) 


where 


n = Pxh*l) ■ Px^i) 

The iterative process is then repeated until the minimum is reached. 

The DFP method has very fast convergence for nonquadratic as well as quad- 
ratic functions if the gradient vector p and the one-dimensional search of mini- 

X 

mizing p (x^ + ysp as a function of y are computed accurately . The success of 
the one-dimensional search and the accuracy of the numerical gradient p are de- 

X 

pendent on the consistency and accuracy with which p (x) is computed . 
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One-dimensional search . - Obtaining the one-dimensional minimum of 
p (x. + Ysj) as a function of y is a critical process for the DFP method . If the one- 
dimensional minimum is obtained precisely, then equation (3) will be satisfied, and 
fast , near quadratic convergence of the DFP method will be realized . However , for 
complicated functions that are encountered, zero for equation (3) is almost impos- 
sible. Numerically, it has been found that, even with some error in computation of 
the one-dimensional minimum , the DFP method will yield near quadratic converg- 
ence . The magnitude of this error will be discussed further in this report . It is 
important, however, that the one-dimensional minimum be efficiently found within 
this tolerable error . 

In the literature, several approaches are described that require computations 
of dp(Y)/dy, as well as p(Y)> to locate the minimum of p(y) (refs. 19 to 21). 
Computations of dp/dy require the computation of p (y) because 


dy 


= P^(Y)’^Sj 


( 11 ) 


The time required for computing p (y) prohibits its use in a one-dimensional 
search. The one-dimensional search described here uses values of p(y) only. 

The one-dimensional search technique features three options: cubic fit using 

the slope at the initial point (ref. 10) , cubic fit without the slope, and the golden 

7 

section method (ref. 9) . 

The golden section method is a single-variable, sequential search method 
used primarily as a starter for the cubic fit method without the slope and as a back- 
up to both cubic fit methods . First, a search is conducted to find a region of uni- 

11 1 

modality providing four points: [y^ ’ ^ j = 1, 2, 3, 4 where y^^ = 0 and 

[Y2^» . P(Y 3 ^)] . and [Y 4 ^.P(Y 4 ^)] are such that 


p(y2^) < P(0) 
p(y3’-) < p(y4^) 


1 1 Vs - 1 

Ys "^4 — ^ 

1 1 3 - Vs 

^2 =V4 


( 12 ) 


7 

I. L. Johnson and G. E. Meyers, "One-Dimensional Minimization Using 
Search by Golden Section and Cubic Fit Methods," JSC IN 67-FM-172 , Nov. 1967 . 
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An example of this geometrical arrangement is shown in the following diagram . 

p(r) 

dp(y = 0) 
dy 


I y 

7 

The golden section method (ref. 9) then proceeds as follows: p(y„ ) and p(y„ ) 

11 ^ J 

are compared. If P(Y 2 ) < ^ shown), then 

r / = Y3^ . p(y4^) = p(y3^) 

Yj^ = Yj' . p(y3^) = p(y2^) 

y/ = Yi^ . p(yi^) = p(yi^) 

.. 2 _.. 

Y2 'Yi + 2 


7 ' ... 

I. L. Johnson and G. E. Meyers, "One-Dimensional Minimization Using 

Search by Golden Section and Cubic Fit Methods," JSC IN 67-FM-172, Nov. 1967. 
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2 11 

and only one additional evaluation of p is made: p(y„ ) . If p(y, ) < p(yo ) » 
then ^ 3 2 


. p(yi^) = p(v2^) 

y ^ = . p(y2^) = p(y3^) 

y/ = Y4^ .p(y/) = p(y/) 

..2_ 2 . (^ 4 ' - 

^3 f 1 2 


2 

£ind pCy^ ) is calculated. This procedure is repeated until on one iteration (e.g., 
the j-th) , |yg^ - y^^ | < where e is a prescribed convergence tolerance. The 

minimum y^ is computed from 


Yi = 



(15) 


For the cubic fit method without the slope, the four points from equation (12) 
are retained and are used to begin a search for the minimum of p using a sequence 
of third-degree polynomials approximating p: 


p(y) - ^0 ^ ^l"^ 


2 ^ 

)Y + a. 


(16) 


The constants a^ , a^ , a^, and a^ are obtained by applying the four points to 

equation (16) and solving the four equations simultaneously for them. From equa- 
tion (16) , the minimums are located from the equation 


-a 


1 



Saiag 


(17) 
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During the search, as a new minimum is found, it is compared to the four points 
used in the fit; and the point having the largest value of p among the five is dis- 
carded. For successive minimums y- y. ,,,if 




^j+1’ 


lYj^.1 - Yjl < EYj 


(18) 


(s being a prescribed convergence tolerance) , the search is terminated and 



For the cubic fit method using the slope 


dy 


(y = 0) E p'(0) 


(19) 


in the curve fit , the search procedure begins on the assumption that the function p 
is quadratic and that its minimum value is approximated by a lower bound Pj^g . 

With this information, the first approximation to the minimum (ref. 10) is 


2[Plb - PW)] 
^1 P’(0) 


( 20 ) 


The next approximation is also quadratic, using p(0), p' (0) , and p(y^) to cal- 
culate the curve fit coefficients . The minimum is approximated by 


'^2 



(21) 


1 2 

where p(y) - next approximation is a cubic fit, and the 

minimum is found by 


^3 


-2a, 


^Ja 


^2 V^2" “ 2aia3 


( 22 ) 


12 13 

where p(y) = a^ + a^y + ( 2 )^ 2 '^ ^0’ ^1’ ^ 2 ’ ^3 are found 

by using the data points p(0), p'(0), p(y^),and p(y 2 )- The previously 
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generated minimums and y^ are compared as in equation (18) . If the test in 

equation (18) is not passed, p' (0) is dropped as a data point, and P(Y 3 ) added; 
and the search will continue as the cubic fit method without the slope p' (0) . 

If the cubic fit method fails without using p' (0) , then the search by golden 
section is used as a backup and initiated with the four points that were retained . 
Because the golden section method is a much slower converging method than the 
cubic fit method , it is used primarily as a starter and as a backup . 

The speed in finding the first four points in equation (12) is controlled by 
the selection of Y2 initially . A prescribed constant value is one selection . Another 

more efficient selection (currently used) is the value of y at the one-dimensional 
minimum from the previous iteration y|_j^- 


Penalty Function Method 

The exterior penalty function method (refs. 1, 2, 5, and 24) selected for 
solving the constrained minimization problem is expressed mathematically by 


p(x) =f(x) + ^ g(x)'^K^g(x) + ih(x)^K 2 U(h)h(x) 


(23) 


where and K 2 are diagonal weight matrices , and U(h) is a diagonal matrix 
with diagonal elements , unit step functions U (h^.) . 

For positive elements of and K 2 . equation (23) implies that in minimiz- 
ing p(x) , f(x) is minimized but is penalized positively for violations in g. and 
h^ . Increasing each element of and K 2 causes further restriction on minimiz- 
ing f, and more emphasis is placed on decreasing the magnitudes of g and h. For 
increasingly large values of elements of and K 2 , the solution minimizing equa- 
tion (23) approaches the solution to the constrained minimization problem (ref. 1) . 
Comparison of the gradient of equation (23) at a minimum of equation (23) 

= fx ^ (Sx•^)PlS•K2'^0■)l']^ = 0 (24) 

and the equation governing the multipliers for the contrained solution 
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where X is a Lagrange multiplier, implies that for increasingly large values of 
elements of and Kg (ref. 1) , 


and 




'■li 


li 


( 26 ) 


h. 

1 


^2j 


(27) 


for the h^ that are tending to be satisfied by strict equality . 

In applications , the size of the elements of and Kg must be exploited 

with regard to computer computation limits. Also, each constraint is identified, 
and an acceptable violation, which will identify a threshold value for its element in 
K^ or Kg, is assigned to it. By assigning threshold values for the elements of K^^ 

and Kg, p(x) is minimized, and an acceptable, practical solution to the constrained 

minimization problem is obtained . For example, v^ 5 ^ 0 is a desired value for final 

^f 

speed . If the constraint for it is defined as g = - 1 , and it is desired that 

f 

Vj - Vj - 8Vj, then a reasonable value for its Kj^ diagonal element has been found to 


be 


-logios 


The penalty function of equation (23) is continuous , together with its first 
derivatives, if f, g, and h are continuous; but its second derivatives are dis- 
continuous for all x such that h. (x) = 0. Because p(x) contains violations in h 


and g at its minimum , an h^ that is tending toward zero will eventually do so from 

the positive side during the latter iterations of convergence , and continuous second 
derivatives will be created during this period. 


By using a method requiring gradients to minimize p (x) , either the closed- 
form computation approach or a numerical computation approach can be used . If a 
closed-form approach is taken, then from equation (23) , p is computed by 




(28) 
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where f , g , and h are computed separately and additional computer storage 

XX X 

is required . Obtaining f j g > and h in closed form is often very difficult , if 

not impossible , and requires storage and many auxiliary computations in solving 
the nonlinear optimization problems. Also, it is possible that there may be larger 
errors in computing p^ than in computing p because they are not computed from 

the same equations . Thus , there may be an inconsistency between p^ and p . 

However, consistency between p and p can be maintained by numerically com- 

puting p on the basis of differences of p . Because only computations of p are 

required , no additional storage and auxiliary computations are required . Thus , 
the penalty function technique becomes completely flexible because any additions or 
deletions of constraints are made by simply altering h or g without having to de- 
rive h or g . Several techniques using differences of p were exploited but 

proved to be no better than the forward and central difference quotients techniques. 
For 9p/9Xj, the forward and central difference techniques are defined, respec- 
tively , by 


9P 

9x. 




1 ’ 


, X. + Ax. , . . . , X \ - p (x) 

3 3 n; 

Ax. 

3 


( 29 ) 


and 



2Ax. 

3 


X- , . . . , X. - Ax X ) 

1 3 


(30) 


With respect to the Univac 1108 computer, using double-precision arithmetic, non- 
dimensional state variables , and normalized parameters , 


0 < Ix.l <10 

' r - 


(31) 


A suitable choice for Ax. 

3 


using forward or central differences is 


Ax. = 1(10) ®lx.| 


(32) 


This perturbation produces partials by using central differences that are accurate 
to approximately six digits . By using the DFP method , this accuracy was adequate 
to obtain near quadratic convergence in many instances . 
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For regions not in a neighborhood of the minimum, the forward difference 
technique can be used to compute partials accurate enough to produce convergence 
comparable to that obtained by using central differences , and only half the compu- 
tations are used in obtaining p^ (shown in eqs, (29) and (30)) . But in a neighbor- 
hood of the minimum , central differences are usually required . 


CHARACTERISTICS 


To begin the iterative process, and Xq are usually chosen by using 

all the available knowledge and information concerning the minimum. When a 
general problem such as an ascent problem is solved , information such as the di- 
agonal of a converged H matrix and a converged parameter vector x is often used 
to make initial guesses to start related ascent problems . The identity matrix is a 
popular candidate for H^, but many iterations are spent "creeping" toward the 

minimum while the H matrix is being manipulated to a point at which significant 
changes can be made in the parameter vector x. This is due to the fact that H, 
along with p , determines the directions of search (eq. (7)) on each iteration. If 

H is completely inappropriate for a given problem, which can happen on the initial 
guess , the search will not be sufficiently downhill , and short , inefficient searches 
will result. When H is the identity matrix, the direction of search is in the local, 
steepest descent direction . In most cases , this is little consolation since initial local 
characteristics soon change , and the search is abruptly terminated . 

With respect to a Univac 1108 computer, the DFP method has been used suc- 
cessfully in atmospheric flight optimization by applying double-precision arithmetic 
throughout the search; single-precision arithmetic has been tried, but convergence 
was very poor (ref. 22) , On a seven-parameter problem, a combination of single- 
and double-precision arithmetic was used with partial success . The author has 
found that the combination of using single-precision to compute p (x) and p and 
double-precision for the DFP method is most successful . ^ 

The DFP method does not require the costly matrix inversion computation re- 
quired by Newton's method and gradient projection. The method is stable if p(x) 
and p are computed accurately and consistently and H is kept positive definite . 

X 

If p (x) and p are computed accurately and if one-dimensional searches are 

X 

accurate , H will remain positive definite , and the method will maintain stability . 
Stability , which means that the direction of search s. on each iteration is guaran- 
teed to be one in which p decreases for positive y, is ensured by the positive 
definiteness of H, because 


dp(0) 

dy 



”iPx(*i) < 0. for Px ^ 0 


(33) 
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During the iteration process , it is beneficial to monitor the trend of the con- 
vergence. One obvious indicator is the change in the penalty function from itera- 
tion to iteration; however , precise trend information cannot be obtained because 
changes will be made at different , often erratic rates during the normal solution of 
a problem. Since there is usually no a priori information on the value of the penalty 
function at the minimum , it is difficult to determine when the solution is found . 

The numerical values of the elements of the gradient vector during the itera- 
tion also may indicate the efficiency of the iteration . This indication is particularly 
valuable for ascertaining when acceptable terminal convergence has been achieved 
because all elements will be zero at the minimum. The primary disadvantage in using 
this approach is that it requires constant surveillance of a rather large vector in 
which one element may be the indicator that convergence has not been achieved . 
Then , too , each element may not decrease monotonically , which makes iteration-to- 
iteration comparisons difficult. In addition to these difficulties, each element of the 
gradient vector has its own relative numerical magnitude . A value that may be 
significantly low for one element may not be sufficiently low for another gradient 
element . 

One of the best indications of the total convergence characteristics of a given 
problem is the comparison of the directional derivatives dp(0)/dy at the beginning 
of each iteration . Another indication of convergence quality is the comparison of 
dp(0)/dy and dp(y.)/dy for a given iteration. 

The directional derivative dp(0)/dy is a scalar computed from the relation- 
ship given in equation (33) . It can be interpreted as the derivative of the penalty 
function in the direction of the one-dimensional search evaluated at the origin of the 
search. If H remains positive definite throughout, the DFP formulation ensures that 
the directional derivative at the origin will increase on each successive iteration 
(stability) . If the derivative is rapidly increasing during initial iterations , it is 
reasonable to assume that a good estimation was made for H initially . If the deriv- 
ative increases slowly for many iterations , this usually indicates that H is being 
built. If dp/dy increases rapidly as the minimum is approached, this indicates 
that p is behaving quadratically and near quadratic convergence is possible. If 
the derivative has increased significantly but then starts to decrease slightly, either 
the one-dimensional search is not accurate enough or p^ is inaccurate in this re- 
gion. A bad one-dimensional search is often characterized by numerous evaluations 
of p , possibly indicating a region in which the function is obtaining noisy evalua- 
tions , which produce problems with one-dimensional search cutoffs . It could also 
mean that the one-dimensional search was initiated incorrectly (either too short or 
too long) and that the search to reach the minimum will be laborious . 

During the course of convergence, as the DFP method minimizes p as ex- 
pressed by equation (23) , the convergence seems to be divided into three regions. 
The division is indicated by both the directional derivative at the origin and the 
value of p . The first region is characterized by large decreases (sometimes two 
orders of magnitude) in p on successive iterations . Coinciding with these de- 
creases are rapid changes of dp/dy (y = 0) from one iteration to another. In this 
region, p is largely dominated by the last two terms of equation (23) . This con- 
dition implies that the DFP method is essentially satisfying the point constraints or 
boundary conditions . The passing of this region of convergence into the second 
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region of convergence is characterized by much smaller decreases in p from one 
iteration to the next , usually in the second or third digit of p . This result indicates 
that p is dominated by f(x). In the second region, f(x) is being minimized , and 
the boundary conditions are being controlled. Simultaneously, dp/dy (y = 0) 
changes very slowly and sometimes stays on approximately the same order of mag- 
nitude for many iterations . Toward the end of this region, p will begin to behave 
quadratically , and stability of the H matrix will be indicated by each matrix ele- 
ment remaining within approximately an order of magnitude of each element of 

p (x) ^ . The third region is characterized by rapid increases in dp/dy (y = 0) 

from one iteration to the next . During this region of convergence , p behaves like 
a quadratic function, and on the last iteration, a Newton step will occur with y == 1. 

This directional derivative should be zero at the minimum, but no general 
determination can be made on the smallest acceptable number for complete numerical 
convergence . This information is not useless, however, because it can be effectively 
used to give information on the total problem convergence characteristics . 

Another characteristic that measures the quality of an individual iteration is 
the directional derivative dp(y.)/dy , the derivative of the penalty function in the 

direction of the search evaluated after the completion of the search. This directional 
derivative is particularly useful when compared with its value at the origin 
dp(0)/dy . If a one-dimensional search can be assumed accurate but is indicated 
inaccurate by looking at dp(y.)/dy (theoretically should be zero) , then p (y.) = 

1 XX 

p (x. , is inaccurate because 
^x 1+1 


However, if 


dy 




dp(Yi) 

-4 

<10 

dp(0) 

dy 


dy 


(34) 


(35) 


on each iteration , this is usually a good indication that the one-dimensional search 
and p^ are being computed accurately enough to ensure near quadratic conver- 
gence . If this accuracy is not maintained , the iterative process may eventually de- 
generate before the minimum is reached . One temporary remedy , which may move 
the solution a little closer to the minimum , is a series of restarts of H , because H 
has probably gone near singular or indefinite from an accumulation of inaccuracies . 
But if equation (35) is maintained, a restart of H is not necessary. In fact, a re- 
start would cause further delay in reaching the minimum because H would have to 
reconverge to values near those at its point of restart. To avoid a restart of H when 
equation (35) is not satisfied, H is not updated according to equation (9) . This 
remedy is currently in the program . 
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As convergence nears the minimum 1 » this result is seen if equations 

(5) , (7) , and (8) are compared. This can be used as a terminal convergence 
indicator . 


CONCLUDING REMARKS 


The Davidon-Fletcher-Powell penalty function method that has been used to 
solve many problems has proved to be one of the most efficient methods for solving 
constrained minimization problems . Constrained minimization problems include 
many different problem definitions concerning parameter optimization , optimal 
control , boundary value , as well as iterator problems and many others . The 
Davidon-Fletcher-Powell method is a technique for iteratively solving unconstrained 
minimization problems; and, for this reason, it is sometimes referred to as an 
iterator . 

To solve problems with continuous state and control inequality constraints , 
the controls should be modeled parametrically, and the constraints should be handled 
either explicitly (whenever it is possible to maintain consistency) or by using an 
0 

error function or a similar method. These types of problems are usually defined 
through the calculus of variations. However, techniques for solving constrained 
minimization problems within that realm are usually so complex and inefficient that 
solutions are often unobtainable . 

The primary problems solved with the Davidon-Fletcher-Powell penalty func- 
tion method have been atmospheric and exoatmospheric flight optimization problems . 
In solving these problems , many methods were developed for defining and solving 
optimization problems by parametric methods using the Davidon-Fletcher-Powell 
penalty function technique . 
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